Abstract

We present a novel approach to detecting and 
utilizing symmetries in probabilistic graph- 
ical models with two main contributions. 
First, we present a scalable approach to 
computing generating sets of permutation 
groups representing the symmetries of graph- 
ical models. Second, we introduce orbital 
Markov chains, a novel family of Markov 
chains leveraging model symmetries to re- 
duce mixing times. We establish an insightful 
connection between model symmetries and 
rapid mixing of orbital Markov chains. Thus, 
we present the first lifted MCMC algorithm 
for probabilistic graphical models. Both ana- 
lytical and empirical results demonstrate the 
effectiveness and efficiency of the approach. 

1 Introduction 

Numerous algorithms exploit model symmetries with 
the goal of reducing the complexity of the computa- 
tional problems at hand. Examples are procedures 
for detecting symmetries of first-order theories [7] and 
propositional formulas [2] in order to avoid the exhaus- 
tive exploration of a partially symmetric search space. 
More recently, symmetry detection approaches have been applied to
answer set programming [11] and (integer) linear programming
[26, 27, 34, 30]. A considerable amount of attention has been given
to approaches utilizing model symmetries in work on lifted
probabilistic inference [36, 9]. Lifted inference is mainly
motivated by the large graphical models resulting from statistical
relational formalisms such as Markov logic networks [38]. The
unifying theme of lifted probabilistic inference is that inference
on the level of instantiated formulas is avoided and instead lifted
to the first-order level. Notable approaches are lifted belief
propagation [41, 22], bisimulation-based approximate inference
algorithms [40], first-order knowledge compilation techniques
[44, 16], and lifted importance sampling approaches [17]. With the
exception of some results for restricted model classes [41, 44, 21],
there is only a somewhat superficial understanding of the underlying
principles of graphical model symmetries and the probabilistic
inference algorithms utilizing such symmetries. Moreover, since most
of the existing approaches are designed for relational models, their
applicability to other types of probabilistic graphical models is
limited.

The presented work contributes to a deeper under- 
standing of the interaction between model symmetries 
and the complexity of inference by establishing a link 
between the degree of symmetry in graphical mod- 
els and polynomial approximability. We describe the 
construction of colored graphs whose automorphism 
groups are equivalent to those of the graphical models 
under consideration. We then introduce the main con- 
tribution, orbital Markov chains, the first general class 
of Markov chains for lifted inference. Orbital Markov 
chains combine the compact representation of symme- 
tries with generating sets of permutation groups with 
highly efficient product replacement algorithms. The 
link between model symmetries and polynomial mixing 
times of orbital Markov chains is established via a path 
coupling argument that is constructed so as to make 
the coupled chains coalesce whenever their respective 
states are located in the same equivalence class of the 
state space. The coupling argument applied to orbital 
Markov chains opens up novel possibilities of analyt- 
ically investigating classes of symmetries that lead to 
polynomial mixing times. 

Complementing the analytical insights, we demonstrate empirically
that orbital Markov chains converge faster to the true distribution
than state-of-the-art Markov chains on well-motivated and
established sampling problems such as the problem of sampling
independent sets from graphs. We also show that existing graph
automorphism algorithms are applicable to computing symmetries of
very large graphical models.



2 Background and Related Work 

We begin by recalling some basic concepts of group 
theory and finite Markov chains both of which are cru- 
cial for understanding the presented work. In addition, 
we give a brief overview of related work utilizing sym- 
metries for the design of algorithms for logical and 
probabilistic inference. 

2.1 Group Theory 

A symmetry of a discrete object is a structure-preserving bijection
on its components. For instance, a symmetry of a graph is a graph
automorphism. Symmetries are often represented with permutation
groups. A group is an abstract algebraic structure
$(\mathfrak{G}, \circ)$, where $\mathfrak{G}$ is a set closed under
a binary associative operation $\circ$ such that there is an
identity element and every element has a unique inverse. Often, we
refer to the group $\mathfrak{G}$ rather than to the structure
$(\mathfrak{G}, \circ)$. We denote the size of a group
$\mathfrak{G}$ as $|\mathfrak{G}|$. A permutation group acting on a
finite set $\Omega$ is a finite set of bijections
$g : \Omega \to \Omega$ that form a group.

Let $\Omega$ be a finite set and let $\mathfrak{G}$ be a permutation
group acting on $\Omega$. If $\alpha \in \Omega$ and
$g \in \mathfrak{G}$ we write $\alpha^g$ to denote the image of
$\alpha$ under $g$. A cycle $(\alpha_1\ \alpha_2\ \ldots\ \alpha_n)$
represents the permutation that maps $\alpha_1$ to $\alpha_2$,
$\alpha_2$ to $\alpha_3$, ..., and $\alpha_n$ to $\alpha_1$. Every
permutation can be written as a product of disjoint cycles where
each element that does not occur in a cycle is understood as being
mapped to itself. We define a relation $\sim$ on $\Omega$ with
$\alpha \sim \beta$ if and only if there is a permutation
$g \in \mathfrak{G}$ such that $\alpha^g = \beta$. The relation
partitions $\Omega$ into equivalence classes which we call orbits.
We use the notation $\alpha^{\mathfrak{G}}$ to denote the orbit
$\{\alpha^g \mid g \in \mathfrak{G}\}$ containing $\alpha$. Let
$f : \Omega \to \mathbb{R}$ be a function from $\Omega$ into the
real numbers and let $\mathfrak{G}$ be a permutation group acting on
$\Omega$. We say that $\mathfrak{G}$ is an automorphism group for
$(\Omega, f)$ if and only if for all $\omega \in \Omega$ and all
$g \in \mathfrak{G}$, $f(\omega) = f(\omega^g)$.
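The definitions above can be made concrete with a few lines of code. The following is a minimal sketch, not part of the original work: permutations are stored as dictionaries, and the helper names `perm_from_cycles`, `orbits`, and `is_automorphism_group` are hypothetical. Orbits are computed by repeatedly applying the generators only, which suffices for finite groups, and invariance of $f$ is checked on the generators only, since invariance under $g$ and $h$ implies invariance under their product.

```python
def perm_from_cycles(cycles, points):
    """Build a permutation (dict: point -> image) from disjoint cycles."""
    perm = {p: p for p in points}  # points outside every cycle are fixed
    for cycle in cycles:
        for i, p in enumerate(cycle):
            perm[p] = cycle[(i + 1) % len(cycle)]
    return perm


def orbits(points, generators):
    """Partition `points` into the orbits of the group generated by `generators`."""
    seen, result = set(), []
    for start in points:
        if start in seen:
            continue
        orbit, frontier = {start}, [start]
        while frontier:
            p = frontier.pop()
            for g in generators:
                q = g[p]
                if q not in orbit:
                    orbit.add(q)
                    frontier.append(q)
        seen |= orbit
        result.append(frozenset(orbit))
    return result


def is_automorphism_group(points, generators, f):
    """Check f(w) == f(w^g) for every generator g and every point w."""
    return all(f(p) == f(g[p]) for g in generators for p in points)


points = [1, 2, 3, 4, 5]
g1 = perm_from_cycles([(1, 2, 3)], points)  # the cycle (1 2 3)
g2 = perm_from_cycles([(4, 5)], points)     # the cycle (4 5)
print(orbits(points, [g1, g2]))
```

With these two generators the orbit partition consists of the blocks {1, 2, 3} and {4, 5}, and any function that is constant on each block has the generated group as an automorphism group.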

2.2 Finite Markov chains 

Given a finite set $\Omega$, a finite Markov chain defines a random
walk $(X_0, X_1, \ldots)$ on elements of $\Omega$ with the property
that the conditional distribution of $X_{n+1}$ given
$(X_0, X_1, \ldots, X_n)$ depends only on $X_n$. For all
$x, y \in \Omega$, $P(x, y)$ is the chain's probability to
transition from $x$ to $y$, and $P^t(x, y) = P^t_x(y)$ is the
probability of being in state $y$ after $t$ steps if the chain
starts at $x$. A Markov chain is irreducible if for all
$x, y \in \Omega$ there exists a $t$ such that $P^t(x, y) > 0$, and
aperiodic if for all $x \in \Omega$,
$\gcd\{t \geq 1 \mid P^t(x, x) > 0\} = 1$. A chain that is both
irreducible and aperiodic converges to its unique stationary
distribution.

The total variation distance $d_{tv}$ of the Markov chain from its
stationary distribution $\pi$ at time $t$ with initial state $x$ is
defined by

$$d_{tv}(P^t_x, \pi) = \frac{1}{2} \sum_{y \in \Omega} |P^t_x(y) - \pi(y)|.$$

For $\varepsilon > 0$, let $\tau_x(\varepsilon)$ denote the least
value $T$ such that $d_{tv}(P^t_x, \pi) \leq \varepsilon$ for all
$t \geq T$. The mixing time $\tau(\varepsilon)$ is defined by
$\tau(\varepsilon) = \max\{\tau_x(\varepsilon) \mid x \in \Omega\}$.
We say that a Markov chain is rapidly mixing if the mixing time is
bounded by a polynomial in $n$ and $\log(\varepsilon^{-1})$, where
$n$ is the size of each configuration in $\Omega$.
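For chains small enough to store as a matrix, both quantities can be computed exactly. The sketch below is an illustration under that assumption; the function names and the two-state example chain are made up for the purpose. It relies on the standard fact that the distance to the stationary distribution never increases with $t$, so the first time the worst-case distance drops to $\varepsilon$ is the mixing time.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance between two distributions on the same finite set."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())


def mixing_time(P, pi, eps, t_max=10_000):
    """Least T such that d_tv(P_x^T, pi) <= eps for every start state x."""
    n = P.shape[0]
    Pt = np.eye(n)  # row x of Pt is the distribution after t steps from x
    for t in range(t_max + 1):
        worst = max(total_variation(Pt[x], pi) for x in range(n))
        if worst <= eps:
            return t
        Pt = Pt @ P
    raise RuntimeError("chain did not mix within t_max steps")


# Hypothetical two-state chain that flips its state with probability 0.1.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])
print(mixing_time(P, pi, eps=0.01))
```

For this example the distance after $t$ steps is $0.5 \cdot 0.8^t$, so the call reports 18 steps for $\varepsilon = 0.01$.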

2.3 Symmetries in Logic and Probability 

Algorithms that leverage model symmetries to solve computationally
challenging problems more efficiently exist in several fields. Most
of the work is related to the computation of symmetry breaking
predicates to improve SAT solver performance [7, 2]. The
construction of our symmetry detection approach is largely derived
from that of symmetry detection in propositional theories [7, 2].
More recently, similar symmetry detection approaches have been put
to work for answer set programming [11] and integer linear
programming [34]. Poole introduced the notion of lifted
probabilistic inference as a variation of variable elimination
taking advantage of the symmetries in graphical models resulting
from probabilistic relational formalisms [36]. Following Poole's
work, several algorithms for lifted probabilistic inference were
developed, such as lifted and counting belief propagation [41, 22],
bisimulation-based approximate inference [40], a general-purpose
MCMC algorithm for relational models [29] and, more recently,
first-order knowledge compilation techniques [44, 16]. In contrast
to existing methods, we present an approach that is applicable to
a much larger class of graphical models.

3 Symmetries in Graphical Models 

Similar to the method of symmetry detection in propo-
sitional formulas [7, 2, 8] we can, for a large class
of probabilistic graphical models, construct a col- 
ored undirected graph whose automorphism group 
is equivalent to the permutation group represent- 
ing the model's symmetries. We describe the ap- 
proach for sets of partially weighted propositional for- 
mulas since Markov logic networks, factor graphs, 
and the weighted model counting framework can be 
represented using sets of (partially) weighted formu- 
las [38, 44, 16]. For the sake of readability, we describe 
the colored graph construction for partially weighted 
clauses. Using a more involved transformation, however, we can
extend it to sets of partially weighted formulas. Let
$S = \{(f_i, w_i)\}$, $1 \leq i \leq n$, be a set of partially
weighted clauses with $w_i \in \mathbb{R}$ if $f_i$ is weighted and
$w_i = \infty$ otherwise. We define an automorphism of $S$ as a
permutation mapping (a) unnegated variables to unnegated variables,
(b) negated variables to negated variables, and (c) clauses to
clauses, respectively, such that this permutation maps $S$ to an
identical set of partially weighted clauses. The set of these
permutations forms the automorphism group of $S$.

Figure 1: The colored graph resulting from the set of
weighted clauses of Example 3.1.

The construction of the colored undirected graph $G(S) = (V, E)$
proceeds as follows. For each variable $a$ occurring in $S$ we add
two nodes $v_a$ and $v_{\neg a}$, modeling the unnegated and negated
variable, respectively, to $V$ and the edge $\{v_a, v_{\neg a}\}$
to $E$. We assign color 0 (1) to nodes corresponding to negated
(unnegated) variables. This coloring precludes permutations that
map a negated variable to an unnegated one or vice versa. We
introduce a distinct color $c_\infty$ for unweighted clauses and a
color $c_w$ for each distinct weight $w$ occurring in $S$. For each
clause $f_i$ with weight $w_i = w$ we add a node $v_{f_i}$ with
color $c_w$ to $V$. For each unweighted clause $f_i$ we add a node
$v_{f_i}$ with color $c_\infty$ to $V$. Finally, we add edges
between each clause node $v_{f_i}$ and the nodes of the negated and
unnegated variables occurring in $f_i$. Note that we can
incorporate evidence by introducing two new and distinct colors
representing true and false variable nodes.

Example 3.1. Let
$\{f_1 := (a \lor \neg c, 0.5),\ f_2 := (b \lor \neg c, 0.5)\}$ be
a set of weighted clauses. We introduce 6 variable nodes
$v_a, v_b, v_c, v_{\neg a}, v_{\neg b}, v_{\neg c}$ where the
former three have color 1 (green) and the latter three color 0
(red). We connect the nodes $v_a$ and $v_{\neg a}$; $v_b$ and
$v_{\neg b}$; and $v_c$ and $v_{\neg c}$. We then introduce two new
clause nodes $v_{f_1}, v_{f_2}$ both with color 2 (yellow) since
they have the same weight. We finally connect the variable nodes
with the clause nodes they occur in. Figure 1 depicts the resulting
colored graph. A generating set of $\mathrm{Aut}(G(S))$, the
automorphism group of this particular colored graph, is
$\{(v_a\ v_b)(v_{\neg a}\ v_{\neg b})(v_{f_1}\ v_{f_2})\}$.
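The construction and the example can be checked by brute force on such a tiny instance. The sketch below is an illustration only: the node encoding, the function names, and the exhaustive search over all node permutations are choices made here, and the exhaustive search is feasible only for a handful of nodes. For real models the text relies on dedicated tools such as Saucy or Nauty instead.

```python
from itertools import permutations


def colored_graph(clauses):
    """Colored graph of a clause set.

    `clauses` is a list of (literals, weight) pairs, a literal is a pair
    (variable, is_positive), and weight None marks an unweighted clause.
    """
    colors, edges = {}, set()
    variables = sorted({v for lits, _ in clauses for v, _ in lits})
    for v in variables:
        colors[("lit", v, True)] = ("literal", 1)   # unnegated variable
        colors[("lit", v, False)] = ("literal", 0)  # negated variable
        edges.add(frozenset({("lit", v, True), ("lit", v, False)}))
    for i, (lits, weight) in enumerate(clauses):
        node = ("clause", i)
        colors[node] = ("weight", weight)           # one color per weight
        for v, positive in lits:
            edges.add(frozenset({node, ("lit", v, positive)}))
    return colors, edges


def automorphisms(colors, edges):
    """All color- and edge-preserving node permutations (brute force)."""
    nodes = sorted(colors, key=repr)
    result = []
    for image in permutations(nodes):
        g = dict(zip(nodes, image))
        if any(colors[n] != colors[g[n]] for n in nodes):
            continue
        if {frozenset(g[n] for n in e) for e in edges} == edges:
            result.append(g)
    return result


# The two weighted clauses of Example 3.1.
clauses = [
    ([("a", True), ("c", False)], 0.5),  # f1 = a or not c
    ([("b", True), ("c", False)], 0.5),  # f2 = b or not c
]
colors, edges = colored_graph(clauses)
autos = automorphisms(colors, edges)
print(len(autos))
```

The search finds exactly two automorphisms: the identity and the permutation that swaps $a$ with $b$, $\neg a$ with $\neg b$, and $f_1$ with $f_2$, which agrees with the single generator stated in the example.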

The following theorem states the relationship between 
the automorphisms of S and the colored graph G(S). 

Theorem 3.2. Let $S = \{(f_i, w_i)\}$, $1 \leq i \leq n$, be a set
of partially weighted clauses and let $\mathrm{Aut}(G(S))$ be the
automorphism group of the colored graph constructed for $S$. There
is a one-to-one correspondence between $\mathrm{Aut}(G(S))$ and the
automorphism group of $S$.



Given a set of partially weighted clauses $S$ with variables $X$ we
have, by Theorem 3.2, that if we define a distribution $\Pr$ over
random variables $X$ with features $f_i$ and weights $w_i$,
$1 \leq i \leq n$, then $\mathrm{Aut}(G(S))$ is an automorphism
group for $(\{0,1\}^X, \Pr)$. Hence, we can use the method to find
symmetries in a large class of graphical models. The complexity of
computing generating sets of $\mathrm{Aut}(G(S))$ is in NP and not
known to be in P or NP-complete. For graphs with bounded degree the
problem is in P [25]. There are specialized algorithms for finding
generating sets of automorphism groups of colored graphs, such as
Saucy [8] and Nauty [28], with remarkable performance. We will show
that Saucy computes irredundant sets of generators of automorphism
groups for graphical models with millions of variables. The size of
these generating sets is bounded by the number of graph vertices.

We briefly position the symmetry detection approach 
in the context of existing algorithms and concepts. 

3.1 Lifted Message Passing 

There are two different lifted message passing algo- 
rithms. Lifted First-Order Belief Propagation [41] op- 
erates on Markov logic networks whereas Counting Be- 
lief Propagation [22] operates on factor graphs. Both 
approaches leverage symmetries in the model to par- 
tition variables and features into equivalence classes. 
Each variable class (supernode/clusternode) contains 
those variable nodes that would send and receive the 
same messages were (loopy) belief propagation (BP) 
run on the original model. Each feature class (super- 
feature/clusterfactor) contains factor nodes that would 
send and receive the same BP messages. 

The colored graph construction provides an alternative approach to
partitioning the variables and features of a graphical model. We
simply compute the orbit partition induced by the permutation group
$\mathrm{Aut}(G(S))$ acting on the set of variables and features.
For instance, the orbit partition of Example 3.1 is
$\{\{a, b\}, \{c\}, \{f_1, f_2\}\}$. In general, orbit partitions
have the following properties: For two variables $v_1, v_2$ in the
same orbit we have that (a) $v_1$ and $v_2$ have identical marginal
probabilities and (b) the variable nodes corresponding to $v_1$ and
$v_2$ would send and receive the same messages were BP run on the
original model; and for two features $f_1$ and $f_2$ in the same
orbit we have that the factor nodes corresponding to $f_1$ and
$f_2$ would send and receive the same BP messages.
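Property (a) can be verified numerically for Example 3.1 by enumerating all eight assignments. The sketch below assumes a log-linear distribution, $\Pr(x) \propto \exp(\sum_i w_i f_i(x))$, as used in Markov logic; that functional form and all function names are assumptions made for illustration.

```python
from itertools import product
from math import exp

VARIABLES = ["a", "b", "c"]
CLAUSES = [
    ([("a", True), ("c", False)], 0.5),  # f1 = a or not c
    ([("b", True), ("c", False)], 0.5),  # f2 = b or not c
]


def satisfied(literals, x):
    """True if assignment x (dict: variable -> bool) satisfies the clause."""
    return any(x[v] == positive for v, positive in literals)


def distribution(clauses, variables):
    """Exact log-linear distribution over all assignments (assumed form)."""
    weights = {}
    for values in product([False, True], repeat=len(variables)):
        x = dict(zip(variables, values))
        weights[values] = exp(sum(w for lits, w in clauses if satisfied(lits, x)))
    z = sum(weights.values())
    return {values: w / z for values, w in weights.items()}


def marginal(dist, variables, var):
    """Probability that `var` is true."""
    i = variables.index(var)
    return sum(p for values, p in dist.items() if values[i])


dist = distribution(CLAUSES, VARIABLES)
for var in VARIABLES:
    print(var, round(marginal(dist, VARIABLES, var), 4))
```

The variables $a$ and $b$ share an orbit and receive the same marginal, while $c$, which sits in its own orbit, receives a different one.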

3.2 Finite Partial Exchangeability 

The notion of exchangeability was introduced by de Finetti [14].
Several theorems concerning finite (partial) exchangeability have
been stated [10, 14]. Given a finite sequence of $n$ binary random
variables $X$, we say that $X$ is exchangeable with respect to the
distribution $\Pr$ if, for every $x \in \{0,1\}^n$ and every
permutation $g$ of the variables, acting on $\{0,1\}^n$, we have
that $\Pr(X = x) = \Pr(X = x^g)$. This is equivalent to saying that
the symmetric group $\mathrm{Sym}(n)$ is an automorphism group for
$(\{0,1\}^n, \Pr)$. Whenever we have finite exchangeability, there
are $n + 1$ orbits each containing the variable assignments with
Hamming weight $i$, $0 \leq i \leq n$. Hence, every exchangeable
probability distribution over $n$ binary random variables is a
unique mixture of draws from the $n + 1$ orbits. In some cases of
partial exchangeability, namely when the orbits can be specified
using a statistic, one can use this for a more compact
representation of the distribution as a product of mixtures [10].
The symmetries that have to be present for such a
re-parameterization to be feasible, however, are rare and
constitute one end of the symmetry spectrum.
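The orbit structure under full exchangeability is easy to confirm by enumeration for small $n$. The following sketch generates $\mathrm{Sym}(n)$ from one transposition and one $n$-cycle (a standard generating set) and lets it act on $\{0,1\}^n$ by permuting positions; the function names and the choice $n = 5$ are illustrative assumptions.

```python
from itertools import product
from math import comb


def act(x, g):
    """Image of assignment x under g: the value at position i moves to g[i]."""
    y = [0] * len(x)
    for i, value in enumerate(x):
        y[g[i]] = value
    return tuple(y)


def assignment_orbits(n, generators):
    """Orbits of {0,1}^n under the group generated by `generators`."""
    seen, result = set(), []
    for x in product((0, 1), repeat=n):
        if x in seen:
            continue
        orbit, frontier = {x}, [x]
        while frontier:
            y = frontier.pop()
            for g in generators:
                z = act(y, g)
                if z not in orbit:
                    orbit.add(z)
                    frontier.append(z)
        seen |= orbit
        result.append(orbit)
    return result


n = 5
transposition = (1, 0, 2, 3, 4)  # swaps positions 0 and 1
cycle = (1, 2, 3, 4, 0)          # shifts every position by one
orbs = assignment_orbits(n, [transposition, cycle])
print(len(orbs), sorted(len(o) for o in orbs))
```

There are $n + 1 = 6$ orbits, one per Hamming weight, and their sizes are the binomial coefficients $\binom{5}{i}$.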

Therefore, a central question is how arbitrary symme- 
tries, compactly represented with irredundant genera- 
tors of permutation groups, can be utilized for efficient 
probabilistic inference algorithms that go beyond (a) 
single variable marginal inference via lifted message 
passing and (b) the limited applicability of finite par- 
tial exchangeability. In order to answer this question, 
we turn to the major contribution of the present work. 

4 Orbital Markov Chains 

Inspired by the previous observations, we introduce or- 
bital Markov chains, a novel family of Markov chains. 
An orbital Markov chain is always derived from an 
existing Markov chain so as to leverage the symme- 
tries in the underlying model. In the presence of 
symmetries orbital Markov chains are able to perform 
wide-ranging transitions reducing the time until con- 
vergence. In the absence of symmetries they are equiv- 
alent to the original Markov chains. Orbital Markov 
chains only require a generating set of a permutation 
group $\mathfrak{G}$ acting on the chain's state space as additional
input. As we have seen, these sets of generators are 
computable with graph automorphism algorithms. 

Let $\Omega$ be a finite set, let $\mathcal{M}' = (X'_0, X'_1, \ldots)$
be a Markov chain with state space $\Omega$, let $\pi$ be a
stationary distribution of $\mathcal{M}'$, and let $\mathfrak{G}$ be
an automorphism group for $(\Omega, \pi)$. The orbital Markov chain
$\mathcal{M} = (X_0, X_1, \ldots)$ for $\mathcal{M}'$ is a Markov
chain which at each integer time $t + 1$ performs the following
steps:

1. Let $X'_{t+1}$ be the state of the original Markov chain
$\mathcal{M}'$ at time $t + 1$;

2. Sample $X_{t+1}$, the state of the orbital Markov chain
$\mathcal{M}$ at time $t + 1$, uniformly at random from
$X'^{\mathfrak{G}}_{t+1}$, the orbit of $X'_{t+1}$.



The orbital Markov chain $\mathcal{M}$, therefore, runs at every
time step $t \geq 1$ the original chain $\mathcal{M}'$ first and
samples the state of $\mathcal{M}$ at time $t$ uniformly at random
from the orbit of the state of the original chain $\mathcal{M}'$ at
time $t$.
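The two steps can also be summarized as a single transition matrix. The following is a sketch derived from the definition, reading the state of the original chain at time $t + 1$ as one step of the original chain started from the orbital chain's state at time $t$ (as in the Gibbs example later in this section). The symbols $P$, $P'$ and $O$ are notation introduced here for illustration.

```latex
P(x, y) \;=\; \sum_{z \in \Omega} P'(x, z)\, O(z, y),
\qquad
O(z, y) \;=\;
\begin{cases}
  1 / |z^{\mathfrak{G}}| & \text{if } y \in z^{\mathfrak{G}}, \\
  0 & \text{otherwise.}
\end{cases}
```

Because $y \in z^{\mathfrak{G}}$ holds exactly when $z \in y^{\mathfrak{G}}$, this is the same as averaging the original transition probabilities over the target orbit: $P(x, y) = |y^{\mathfrak{G}}|^{-1} \sum_{z \in y^{\mathfrak{G}}} P'(x, z)$. If the group contains only the identity, $O$ is the identity matrix and $P = P'$.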

First, let us analyze the complexity of the second step, which is
the step that differs from the original Markov chain. Given a state
$X_t$ and a permutation group $\mathfrak{G}$ we need to sample an
element from $X_t^{\mathfrak{G}}$, the orbit of $X_t$, uniformly at
random. By the orbit-stabilizer theorem this is equivalent to
sampling an element $g \in \mathfrak{G}$ uniformly at random and
computing $X_t^g$. Sampling group elements nearly uniformly at
random is a well-researched problem [6] and computable in time
polynomial in the size of the generating sets with product
replacement algorithms [35]. These algorithms are implemented in
several group algebra systems such as GAP [15] and exhibit
remarkable performance. Once initialized, product replacement
algorithms can generate pseudo-random elements by performing a
small number of group multiplications. We verified that the
overhead of step 2 during the sampling process is indeed
negligible.
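To illustrate the idea behind product replacement, here is a deliberately simplified sketch. It keeps a pool of group elements seeded with the generators and repeatedly overwrites one pool entry with its product with another entry. Production implementations differ in details (for example, they may also multiply by inverses and keep an accumulator), so this should not be read as the algorithm of any particular system. The pool size, warm-up length, and function names are arbitrary choices made here, and the output is only approximately uniform.

```python
import random


def compose(g, h):
    """Permutation that applies g first and then h (g[i] is the image of i)."""
    return tuple(h[g[i]] for i in range(len(g)))


def act(x, g):
    """Image of assignment x under g: the value at position i moves to g[i]."""
    y = [0] * len(x)
    for i, value in enumerate(x):
        y[g[i]] = value
    return tuple(y)


def make_sampler(generators, slots=10, warmup=100, seed=0):
    """Return a function that yields pseudo-random elements of the generated group."""
    rng = random.Random(seed)
    size = max(slots, len(generators), 2)
    pool = [generators[i % len(generators)] for i in range(size)]

    def step():
        i, j = rng.sample(range(len(pool)), 2)  # two distinct pool positions
        if rng.random() < 0.5:
            pool[i] = compose(pool[i], pool[j])
        else:
            pool[i] = compose(pool[j], pool[i])
        return pool[i]

    for _ in range(warmup):  # mix the pool before handing out elements
        step()
    return step


# Two reflections of a 3x3 grid, positions numbered row by row from 0 to 8.
generators = [(2, 1, 0, 5, 4, 3, 8, 7, 6), (8, 5, 2, 7, 4, 1, 6, 3, 0)]
sample = make_sampler(generators)
state = (1, 0, 0, 0, 0, 0, 0, 0, 1)
print(act(state, sample()))  # a pseudo-random element of the orbit of `state`
```

Every element returned is a product of generators and therefore lies in the group, so applying it to a state always yields a member of that state's orbit.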

Before we analyze the conditions under which orbital 
Markov chains are aperiodic, irreducible, and have the 
same stationary distribution as the original chain, we 
provide an example of an orbital Markov chain that 
is based on the standard Gibbs sampler which is com- 
monly used to perform probabilistic inference. 

Example 4.1. Let $\mathbf{V}$ be a finite set of random variables
with probability distribution $\pi$, and let $\mathfrak{G}$ be an
automorphism group for $(\times_{V \in \mathbf{V}} V, \pi)$. The
orbital Markov chain for the Gibbs sampler is a Markov chain
$\mathcal{M} = (X_0, X_1, \ldots)$ which, being in state $X_t$,
performs the following steps at time $t + 1$:

1. Select a variable $V \in \mathbf{V}$ uniformly at random;

2. Sample $X'_{t+1}(V)$, the value of $V$ in the configuration
$X'_{t+1}$, according to the conditional $\pi$-distribution of $V$
given that all other variables take their values according to
$X_t$;

3. Let $X'_{t+1}(W) = X_t(W)$ for all variables
$W \in \mathbf{V} \setminus \{V\}$; and

4. Sample $X_{t+1}$ from $X'^{\mathfrak{G}}_{t+1}$, the orbit of
$X'_{t+1}$, uniformly at random.

We call this Markov chain the orbital Gibbs sampler. In the absence
of symmetries, that is, if $\mathfrak{G}$'s only element is the
identity permutation, the orbital Gibbs sampler is equivalent to
the standard Gibbs sampler.
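The four steps translate into code almost line by line. The sketch below is a minimal illustration for binary variables, under two assumptions that are not from the text: the group is small enough to list all of its elements explicitly, and the model is the log-linear reading of the two clauses of Example 3.1, whose only non-trivial symmetry swaps the first two variables. All names are hypothetical.

```python
import math
import random


def weight(x):
    """Unnormalized probability for the clauses (a or not c), (b or not c)."""
    a, b, c = x
    f1 = 1.0 if (a == 1 or c == 0) else 0.0
    f2 = 1.0 if (b == 1 or c == 0) else 0.0
    return math.exp(0.5 * f1 + 0.5 * f2)


def act(x, g):
    """Image of assignment x under g: the value at position i moves to g[i]."""
    y = [0] * len(x)
    for i, value in enumerate(x):
        y[g[i]] = value
    return tuple(y)


def gibbs_step(x, rng):
    """Steps 1-3: resample one uniformly chosen variable from its conditional."""
    i = rng.randrange(len(x))
    x1 = x[:i] + (1,) + x[i + 1:]
    x0 = x[:i] + (0,) + x[i + 1:]
    w1, w0 = weight(x1), weight(x0)
    return x1 if rng.random() < w1 / (w0 + w1) else x0


def orbital_gibbs_step(x, group, rng):
    """Steps 1-4: a Gibbs step followed by a uniform draw from the orbit."""
    return act(gibbs_step(x, rng), rng.choice(group))


GROUP = [(0, 1, 2), (1, 0, 2)]  # identity and the swap of a and b


def estimate_marginal(index, steps, seed):
    """Fraction of visited states in which variable `index` equals 1."""
    rng = random.Random(seed)
    x, hits = (0, 0, 0), 0
    for _ in range(steps):
        x = orbital_gibbs_step(x, GROUP, rng)
        hits += x[index]
    return hits / steps


print(estimate_marginal(0, 20000, seed=0), estimate_marginal(1, 20000, seed=0))
```

Drawing a group element uniformly and applying it gives a uniform draw from the orbit, as noted in the complexity discussion above. With the trivial group `[(0, 1, 2)]` the function reduces to the standard Gibbs step.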

Let us now state a major result of this paper. It relates 
properties of the orbital Markov chain to those of the 
Markov chain it is derived from. A detailed proof can 
be found in the appendix. 
Theorem 4.2. Let $\Omega$ be a finite set and let $\mathcal{M}'$ be
a Markov chain with state space $\Omega$ and transition matrix
$P'$. Moreover, let $\pi$ be a probability distribution on
$\Omega$, let $\mathfrak{G}$ be an automorphism group for
$(\Omega, \pi)$, and let $\mathcal{M}$ be the orbital Markov chain
for $\mathcal{M}'$. Then,

(a) if $\mathcal{M}'$ is aperiodic then $\mathcal{M}$ is also
aperiodic;

(b) if $\mathcal{M}'$ is irreducible then $\mathcal{M}$ is also
irreducible;

(c) if $\pi$ is a reversible distribution for $\mathcal{M}'$ and,
for all $g \in \mathfrak{G}$ and all $x, y \in \Omega$, we have that
$P'(x, y) = P'(x^g, y^g)$, then $\pi$ is also a reversible and,
hence, a stationary distribution for $\mathcal{M}$.

The condition in statement (c), requiring for all
$g \in \mathfrak{G}$ and all $x, y \in \Omega$ that
$P'(x, y) = P'(x^g, y^g)$, conveys that the original Markov chain
is compatible with the symmetries captured by the permutation group
$\mathfrak{G}$. This rather weak assumption is met by all of the
practical Markov chains we are aware of and, in particular, by
Metropolis chains and the standard Gibbs sampler.

Corollary 4.3. Let $\mathcal{M}'$ be the Markov chain of the Gibbs
sampler with reversible distribution $\pi$. The orbital Gibbs
sampler for $\mathcal{M}'$ is aperiodic and has $\pi$ as a
reversible and, hence, a stationary distribution. Moreover, if
$\mathcal{M}'$ is irreducible then the orbital Gibbs sampler is
also irreducible and it has $\pi$ as its unique stationary
distribution.

We will show both analytically and empirically that, in the
presence of symmetries, the orbital Gibbs sampler converges to the
true distribution at least as fast as, or faster than,
state-of-the-art sampling algorithms. First, however, we want to
take a look at an example that illustrates the advantages of the
orbital Gibbs sampler.

Example 4.4. Consider the undirected graphical model in Figure 2
with two binary random variables and a symmetric potential
function. The probabilities of the states 01 and 10 are both 0.49.
Due to the symmetry in the model, the states 10 and 01 are part of
the same orbit. Now, let us assume a standard Gibbs sampler is in
state 10. The probability for it to transition to one of the states
11 and 00 is only 0.02 and, by definition of the standard Gibbs
sampler, it cannot transition directly to the state 01. The chain
is "stuck" in the state 10 until it is able to move to 11 or 00.
Now, consider the orbital Gibbs sampler. Intuitively, while it is
"waiting" to move to one of the low probability states, it moves
horizontally between the two high probability states by sampling
uniformly at random from the orbit {01, 10}. In this particular
case the orbital Gibbs sampler converges faster than the standard
Gibbs sampler, a fact that we will also show analytically.

Figure 2: An undirected graphical model over two binary variables
and one symmetric potential function with corresponding
distribution shown on the right. There are three orbits each
indicated by one of the rounded rectangles.
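The behavior described in Example 4.4 can be reproduced exactly with 4x4 transition matrices. The sketch below assumes that the remaining probability mass is split evenly, 0.01 each, between the states 00 and 11; this is consistent with the 0.02 transition probability quoted in the example but is an assumption, as are all names in the code. The orbital chain is obtained by multiplying the Gibbs matrix with a matrix that averages uniformly over each orbit.

```python
import numpy as np

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]
INDEX = {s: i for i, s in enumerate(STATES)}
PI = np.array([0.01, 0.49, 0.49, 0.01])  # 0.01 for 00 and 11 is assumed


def gibbs_matrix():
    """Transition matrix of the standard Gibbs sampler on two binary variables."""
    P = np.zeros((4, 4))
    for x in STATES:
        for i in range(2):           # variable picked with probability 1/2
            for v in (0, 1):         # new value for that variable
                y = x[:i] + (v,) + x[i + 1:]
                other = x[:i] + (1 - v,) + x[i + 1:]
                cond = PI[INDEX[y]] / (PI[INDEX[y]] + PI[INDEX[other]])
                P[INDEX[x], INDEX[y]] += 0.5 * cond
    return P


# Orbits of the swap symmetry: {00}, {01, 10}, {11}.
ORBIT_OF = [[0], [1, 2], [1, 2], [3]]
O = np.zeros((4, 4))
for z, orbit in enumerate(ORBIT_OF):
    for y in orbit:
        O[z, y] = 1.0 / len(orbit)

P_GIBBS = gibbs_matrix()
P_ORBITAL = P_GIBBS @ O  # Gibbs step followed by a uniform draw from the orbit


def tv_from(P, start, t):
    """Total variation distance to PI after t steps from state index `start`."""
    dist = np.zeros(4)
    dist[start] = 1.0
    for _ in range(t):
        dist = dist @ P
    return 0.5 * float(np.abs(dist - PI).sum())


for t in (1, 10, 100):
    print(t, tv_from(P_GIBBS, INDEX[(1, 0)], t), tv_from(P_ORBITAL, INDEX[(1, 0)], t))
```

Started in state 10, the standard Gibbs sampler is still at total variation distance 0.49 after one step, whereas under these assumed numbers the orbital chain already matches the target distribution after a single step.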



4.1 Mixing Time of Orbital Markov Chains 

We will make our intuition about the faster conver- 
gence of the orbital Gibbs sampler more concrete. We 
accomplish this by showing that the more symmetry 
there is in the model the faster a coupling of the orbital 
Markov chain will coalesce and, therefore, the faster 
the chain will converge to its stationary distribution. 

There are several methods available to prove rapid mixing of a
finite Markov chain. The method we will use here is that of a
coupling. A coupling for a Markov chain $\mathcal{M}$ is a
stochastic process $(X_t, Y_t)$ on $\Omega \times \Omega$ such that
$(X_t)$ and $(Y_t)$ considered marginally are faithful copies of
$\mathcal{M}$. The coupling lemma expresses that the total
variation distance of $\mathcal{M}$ at time $t$ is limited from
above by the probability that the two chains have not coalesced,
that is, have not met at time $t$ (see for instance Aldous [1]).
Coupling proofs on the joint space $\Omega \times \Omega$ are often
rather involved and require complex combinatorial arguments. A
possible simplification is provided by the path coupling method
where a coupling is only required to hold on a subset of
$\Omega \times \Omega$ (Bubley and Dyer [4]). The following theorem
formalizes this idea.

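Theorem 4.5 (Dyer and Greenhill [12]). Let $\delta$ be an integer
valued metric defined on $\Omega \times \Omega$ taking values in
$\{0, \ldots, D\}$. Let $S \subseteq \Omega \times \Omega$ such that
for all $(X_t, Y_t) \in \Omega \times \Omega$ there exists a path
$X_t = Z_0, Z_1, \ldots, Z_r = Y_t$ between $X_t$ and $Y_t$ with
$(Z_l, Z_{l+1}) \in S$ for $0 \leq l < r$ and

$$\sum_{l=0}^{r-1} \delta(Z_l, Z_{l+1}) = \delta(X_t, Y_t).$$

Define a coupling $(X, Y) \mapsto (X', Y')$ of the Markov chain
$\mathcal{M}$ on all pairs $(X, Y) \in S$. Suppose there exists
$\beta \leq 1$ with $E[\delta(X', Y')] \leq \beta\,\delta(X, Y)$
for all $(X, Y) \in S$. If $\beta < 1$ then the mixing time
$\tau(\varepsilon)$ of $\mathcal{M}$ satisfies

$$\tau(\varepsilon) \leq \frac{\ln(D \varepsilon^{-1})}{1 - \beta}.$$

If $\beta = 1$ and there exists an $\alpha > 0$ such that
$\Pr[\delta(X_{t+1}, Y_{t+1}) \neq \delta(X_t, Y_t)] \geq \alpha$
for all $t$, then

$$\tau(\varepsilon) \leq \left\lceil \frac{e D^2}{\alpha} \right\rceil \lceil \ln(\varepsilon^{-1}) \rceil.$$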
We selected the insert/delete Markov chain for independent sets of
graphs for our analysis. Sampling independent sets is a classical
problem motivated by numerous applications and with a considerable
amount of recent research devoted to it [24, 13, 45, 42, 37]. The
coupling proof for the orbital version of this Markov 
chain provides interesting insights into the construc- 
tion of such a coupling and the influence of the graph 
symmetries on the mixing time. The proof strategy is 
in essence applicable to other sampling algorithms. 

Let $G = (V, E)$ be a graph. A subset $X$ of $V$ is an independent
set if $\{v, w\} \notin E$ for all $v, w \in X$. Let
$\mathcal{I}(G)$ be the set of all independent sets in a given
graph $G$ and let $\lambda$ be a positive real number. The
partition function $Z = Z(\lambda)$ and the corresponding
probability measure $\pi_\lambda$ on $\mathcal{I}(G)$ are defined
by

$$Z = Z(\lambda) = \sum_{X \in \mathcal{I}(G)} \lambda^{|X|} \quad \text{and} \quad \pi_\lambda(X) = \frac{\lambda^{|X|}}{Z}.$$

Approximating the partition function and sampling from
$\mathcal{I}(G)$ can be accomplished using a rapidly mixing Markov
chain with state space $\mathcal{I}(G)$ and stationary distribution
$\pi_\lambda$. The simplest Markov chain for independent sets is
the so-called insert/delete chain [13]. If $X_t$ is the state at
time $t$ then the state at time $t + 1$ is determined by the
following procedure:

1. Select a vertex $v \in V$ uniformly at random;

2. If $v \in X_t$ then let $X_{t+1} = X_t \setminus \{v\}$ with
probability $1/(1 + \lambda)$;

3. If $v \notin X_t$ and $v$ has no neighbors in $X_t$ then let
$X_{t+1} = X_t \cup \{v\}$ with probability
$\lambda/(1 + \lambda)$;

4. Otherwise let $X_{t+1} = X_t$.

Using a path coupling argument one can show that the insert/delete
chain is rapidly mixing for $\lambda < 1/(\Delta - 1)$ where
$\Delta$ is the maximum degree of the graph [13]. We can turn the
insert/delete Markov chain into the orbital insert/delete Markov
chain $\mathcal{M}(\mathcal{I}(G))$ simply by adding the following
fifth step:

5. Sample $X_{t+1}$ uniformly at random from its orbit.
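The five steps can be sketched as follows for a 3x3 grid. This is an illustration under stated assumptions rather than the implementation used for the experiments: vertices are numbered 0 to 8 row by row, the symmetry group is generated from two grid reflections and stored explicitly (which is only practical for small groups), and all function names are hypothetical.

```python
import random


def grid_neighbors(k):
    """Adjacency sets of the k x k grid, vertices numbered row by row."""
    nb = {v: set() for v in range(k * k)}
    for r in range(k):
        for c in range(k):
            v = r * k + c
            if c + 1 < k:
                nb[v].add(v + 1)
                nb[v + 1].add(v)
            if r + 1 < k:
                nb[v].add(v + k)
                nb[v + k].add(v)
    return nb


def generate_group(generators):
    """All elements of the permutation group generated by `generators`."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = tuple(s[g[i]] for i in range(len(g)))  # apply g, then s
            if h not in group:
                group.add(h)
                frontier.append(h)
    return sorted(group)


def insert_delete_step(X, nb, lam, rng):
    """Steps 1-4 of the insert/delete chain."""
    v = rng.randrange(len(nb))
    if v in X:
        if rng.random() < 1.0 / (1.0 + lam):
            return X - {v}
    elif not (nb[v] & X):
        if rng.random() < lam / (1.0 + lam):
            return X | {v}
    return X


def orbital_insert_delete_step(X, nb, lam, group, rng):
    """Steps 1-5: an insert/delete move followed by a uniform orbit draw."""
    g = rng.choice(group)
    return frozenset(g[v] for v in insert_delete_step(X, nb, lam, rng))


NB = grid_neighbors(3)
GROUP = generate_group([(2, 1, 0, 5, 4, 3, 8, 7, 6), (8, 5, 2, 7, 4, 1, 6, 3, 0)])

rng = random.Random(1)
X = frozenset()
for _ in range(1000):
    X = orbital_insert_delete_step(X, NB, 1.0, GROUP, rng)
print(sorted(X))
```

Because every group element is a graph automorphism, the orbit step maps independent sets to independent sets of the same size, so the chain never leaves the state space.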

By Corollary 4.3 the orbital insert/delete chain for independent
sets is aperiodic, irreducible, and has $\pi_\lambda$ as
its unique stationary distribution. We can now state
the following theorem concerning the mixing time of 
this Markov chain. It relates the graph symmetries to 
the mixing time of the chain. The proof of the theorem 
is based on a path coupling that is constructed so as 
to make the two chains coalesce whenever their respec- 
tive states are located in the same orbit. A detailed 
and instructive proof can be found in the appendix. 
Figure 3: Two independent sets $X \cup \{v\}$ and $X \cup \{w\}$ of
the 4x4 grid located in the same orbit. The first permutation is
the reflection on the sketched diagonal, the second a clockwise
90° rotation.



Theorem 4.6. Let $G = (V, E)$ be a graph with maximum degree
$\Delta$, let $\lambda$ be a positive real number, and let
$\mathfrak{G}$ be an automorphism group for
$(\{0,1\}^V, \pi_\lambda)$. Moreover, let
$(X \cup \{v\}) \in \mathcal{I}(G)$, let
$(X \cup \{w\}) \in \mathcal{I}(G)$, let $\{v, w\} \in E$, and let
$\rho = \Pr[(X \cup \{v\}) \notin (X \cup \{w\})^{\mathfrak{G}}]$.
The orbital insert/delete chain $\mathcal{M}(\mathcal{I}(G))$ is
rapidly mixing if either $\rho < 0.5$ or
$\lambda < 1/((2\rho - 1)\Delta - 1)$.
The theorem establishes the important link between the graph
automorphisms and the mixing time of the orbital insert/delete
chain. The more symmetries the graph exhibits the larger the orbits
and the sooner the chains coalesce. Figure 3 depicts the 4x4 grid
with two independent sets $X \cup \{v\}$ and $X \cup \{w\}$ with
$\{v, w\} \in E$ and
$(X \cup \{v\}) \in (X \cup \{w\})^{\mathfrak{G}}$. Since
$\rho < 1$ for $n \times n$ grids, $n \geq 4$, we can prove (a)
rapid mixing of the orbital insert/delete chain for larger
$\lambda$ values and (b) more rapid mixing for identical $\lambda$
values.
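To get a feel for the condition in Theorem 4.6, the following worked arithmetic compares the two bounds on $\lambda$ for maximum degree $\Delta = 4$, as in a grid. It reads $\rho$ as the probability that the two sets lie in different orbits, and the value $\rho = 0.8$ is a made-up illustration, not a quantity computed for any particular graph.

```latex
\begin{align*}
\text{insert/delete chain:} \quad
  & \lambda < \frac{1}{\Delta - 1} = \frac{1}{3} \approx 0.333 \\
\text{orbital chain, } \rho = 0.8\text{:} \quad
  & \lambda < \frac{1}{(2\rho - 1)\Delta - 1}
    = \frac{1}{0.6 \cdot 4 - 1} = \frac{1}{1.4} \approx 0.714 \\
\text{orbital chain, } \rho = 1\text{:} \quad
  & \lambda < \frac{1}{(2 \cdot 1 - 1) \cdot 4 - 1} = \frac{1}{3}
\end{align*}
```

Under this reading, a graph without usable symmetries ($\rho = 1$) recovers the bound of the plain insert/delete chain, and any $\rho$ strictly below 1 enlarges the admissible range of $\lambda$.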

The next corollary follows from Theorem 4.6 and the simple fact
that $X' \in X^{\mathfrak{G}}$ for all $X, X' \subseteq V$ with
$|X| = |X'|$ whenever $\mathfrak{G}$ is the symmetric group on $V$.

Corollary 4.7. Let $G = (V, E)$ be a graph, let $\lambda$ be a
positive real number, and let $\mathfrak{G}$ be an automorphism
group for $(\{0,1\}^V, \pi_\lambda)$. If $\mathfrak{G}$ is the
symmetric group $\mathrm{Sym}(V)$ then
$\mathcal{M}(\mathcal{I}(G))$ is rapidly mixing with
$\tau(\varepsilon) \leq |V| \ln(|V| \varepsilon^{-1})$.

By analyzing the coupling proof of Theorem 4.6 and, 
in particular, the moves leading to states $(X', Y')$ with
$|X'| = |Y'|$ and the probability that $X'$ and $Y'$ are
located in the same orbit in these cases, it is possible 
to provide more refined bounds. Moreover, to capture 
the full power of orbital Markov chains, a coupling 
argument should not merely consider pairs of states 
with Hamming distance 1. Indeed, the strength of the 
orbital chains is that, in the presence of symmetries 
in the graph topology, there is a non-zero probability 
that states with large Hamming distance (up to |V|) 
are located in the same orbit. The method presented 
here is also applicable to Markov chains known to mix 
rapidly for larger $\lambda$ values than the insert/delete chain
such as the insert/delete/drag chain [13].
Table 1: Number of vertices and edges of the colored
graphs, the runtime of Saucy, and the number of 
(super-)features of the social network model without 
and with 10% evidence. 




Figure 4: From left to right: the 3-grid, the 3-connected
cliques, and the 3-complete graph models.

5 Experiments 

Two graphical models were used to evaluate the symmetry detection
approach. The first is the "Friends & Smokers" Markov logic network
where, for a random 10% of all people, it is known (a) whether they
smoke or not and (b) who 10 of their friends are [41]. The second
is the $k \times k$ grid model, an established and well-motivated
lattice model with numerous applications [37]. All experiments were
conducted on a PC with an AMD Athlon dual core 5400B 1.0 GHz
processor and 3 GB RAM. Table 1 lists the results for varying model
sizes. Saucy's runtime scales roughly quadratically with the number
of vertices and it performs better for the $k \times k$ grid
models. This might be due to the larger sets of generators for the
permutation groups of the social network model. Table 1 also lists
the number of features of the ground social network model
(features) and the number of feature orbits without (orbs w/o) and
with (orbs w/) 10% evidence.

We proceeded to compare the performance of the orbital Markov
chains with state-of-the-art algorithms for sampling independent
sets. We used GAP [15], a system for computational discrete
algebra, and the Orb package [31]¹ to implement the sampling
algorithms. The experiments can easily be replicated by installing
GAP and the Orb package and by running the GAP files available at a
dedicated code repository². For the evaluation of the sampling
algorithms we selected three different graph topologies exhibiting
varying degrees of symmetry:

Figure 5: The results of the three Gibbs samplers for
the 5-grid model (top) and the 6-grid model (bottom).

The k-grid model is the 2-dimensional $k \times k$ grid. An
instance of the model for $k = 3$ is depicted in Figure 4 (left).
Here, the generating set of the permutation group $\mathfrak{G}$
computed by Saucy is $\{(a\ c)(d\ f)(g\ i),\ (a\ i)(b\ f)(d\ h)\}$
and $|\mathfrak{G}| = 8$. The permutation group partitions the set
$\{0,1\}^9$ into 102 orbits with each orbit having a cardinality in
$\{1, 2, 4, 8\}$.
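These numbers can be reproduced by brute force. The sketch below assumes that the nine grid vertices are labeled a to i row by row and mapped to positions 0 to 8, so that the two generators become a left-right reflection and a reflection on the diagonal through c, e, and g; the labeling and all function names are assumptions made for illustration.

```python
from itertools import product


def compose(g, h):
    """Permutation that applies g first and then h (g[i] is the image of i)."""
    return tuple(h[g[i]] for i in range(len(g)))


def generate_group(generators):
    """All elements of the permutation group generated by `generators`."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def act(x, g):
    """Image of assignment x under g: the value at position i moves to g[i]."""
    y = [0] * len(x)
    for i, value in enumerate(x):
        y[g[i]] = value
    return tuple(y)


# (a c)(d f)(g i) and (a i)(b f)(d h) with a..i mapped to 0..8.
GENERATORS = [(2, 1, 0, 5, 4, 3, 8, 7, 6), (8, 5, 2, 7, 4, 1, 6, 3, 0)]
group = generate_group(GENERATORS)

# Each orbit is collected once as a frozenset of assignments.
orbit_set = {frozenset(act(x, g) for g in group) for x in product((0, 1), repeat=9)}
print(len(group), len(orbit_set), sorted({len(o) for o in orbit_set}))
```

The script reports a group of order 8 and 102 orbits with sizes 1, 2, 4, and 8, matching the figures stated above.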

The k-connected cliques model is a graph with $k + 1$ distinct
cliques each of size $k - 1$ and each connected with one edge to
the same vertex. Statistical relational formalisms such as Markov
logic networks often lead to similar graph topologies. An instance
for $k = 3$ is depicted in Figure 4 (center). Here, the generating
set of $\mathfrak{G}$ computed by Saucy is
$\{(a\ g)(b\ f),\ (a\ c)(b\ d),\ (a\ i)(b\ h)\}$ and
$|\mathfrak{G}| = 24$. The permutation group $\mathfrak{G}$
partitions the set $\{0,1\}^9$ into 70 orbits with cardinalities in
$\{1, 4, 6, 12, 24\}$.

Figure 6: The results of the three Gibbs samplers for
the k-connected cliques model.

Figure 7: The results of the three Gibbs samplers for
the k-complete graph model.

¹ http://www.gap-system.org/Packages/orb.html

² http://code.google.com/p/lifted-mcmc/

The k-complete graph model is a complete graph with 
k 2 vertices. Figure 4 (right) depicts an instance for k = 
3. Here, the generating set of © computed by Saucy 
is {{b c), (6 d), (b e), (b /), (6 g), (b h), (b i), (a b)} and 
|<S| = 9! = 362880. The permutation group <S parti- 
tions the set {0, l} 9 in 10 orbits with each orbit having 
a cardinality in {1,9,36,84,126}. 
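The orbit counts reported for the three k = 3 instances can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' GAP implementation: it assumes the vertices are labelled a to i as in the generating sets above, and the helper names `perm_from_cycles` and `orbit_sizes` are hypothetical. It merges every state in $\{0, 1\}^9$ with its image under each generator using a union-find structure, which yields the orbits without ever enumerating the group.

```python
from collections import Counter
from itertools import product

LABELS = "abcdefghi"  # vertex names as used in the generating sets


def perm_from_cycles(cycles):
    """Turn cycles such as ["ac", "df", "gi"] into a list of images."""
    perm = list(range(len(LABELS)))
    for cycle in cycles:
        idx = [LABELS.index(ch) for ch in cycle]
        for src, dst in zip(idx, idx[1:] + idx[:1]):
            perm[src] = dst
    return perm


def orbit_sizes(generators):
    """Map each orbit size to the number of orbits of {0,1}^9 of that size."""
    n = len(LABELS)
    states = list(product((0, 1), repeat=n))
    parent = {s: s for s in states}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    # Two states share an orbit iff they are connected by generator moves.
    for s in states:
        for perm in generators:
            image = [0] * n
            for i, bit in enumerate(s):
                image[perm[i]] = bit
            root_s, root_t = find(s), find(tuple(image))
            if root_s != root_t:
                parent[root_s] = root_t

    members = Counter(find(s) for s in states)
    return Counter(members.values())


MODELS = {
    "3-grid": [["ac", "df", "gi"], ["ai", "bf", "dh"]],
    "3-connected cliques": [["ag", "bf"], ["ac", "bd"], ["ai", "bh"]],
    "3-complete graph": [["bc"], ["bd"], ["be"], ["bf"],
                         ["bg"], ["bh"], ["bi"], ["ab"]],
}

for name, gens in MODELS.items():
    sizes = orbit_sizes([perm_from_cycles(g) for g in gens])
    print(name, sum(sizes.values()), sorted(sizes))
```

Run as written, the sketch reports 102, 70 and 10 orbits with the cardinality sets listed above. Working from the generators alone matters for the complete graph, where the group itself has 9! elements.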

Saucy needed at most 5 ms to compute the sets of generators for the permutation groups of the three models for k = 6. We generated samples of the probability measure $\pi_\lambda$ on $\mathcal{I}(G)$ for $\lambda = 1$ and the three different graph topologies by running (a) the insert/delete chain, (b) the insert/delete/drag chain [13], and (c) the orbital insert/delete chain. Each chain was started in the state corresponding to the empty set and no burn-in period was used. The orbital insert/delete chain did not require more RAM and needed 50 microseconds per sample, an overhead of about 25% relative to the 40 microseconds of the insert/delete chain. The 25% overhead remained constant and independent of the size of the graphs. Since the sampling algorithms create large files with all accumulated samples, I/O overhead is included in these times. For each of the three topologies and each of the three Gibbs samplers, we computed the total variation distance between the distribution approximated using all accumulated samples and the true distribution $\pi_\lambda$.
Figure 5 plots the total variation distance over elapsed time for the k-grid model for k = 5 and k = 6. The orbital insert/delete chain (Orbital Gibbs) converges the fastest. The insert/delete/drag chain (Drag Gibbs) converges faster than the insert/delete chain (Gibbs), but since the insert/delete/drag chain has a small computational overhead, the difference is less pronounced for k = 6. The same behavior is observable for the other graph topologies (see Figures 6 and 7), where the orbital Markov chain outperforms the others. In summary, the larger the cardinalities of the orbits induced by the symmetries, the faster the orbital Gibbs sampler converges relative to the other chains.
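The experimental protocol can be illustrated on the 3-grid, which is small enough to enumerate. The sketch below is an assumption-laden stand-in for the GAP code: it enumerates the 8-element group explicitly instead of using the Orb package, the function names are hypothetical, and the step count and seed are arbitrary. It runs the insert/delete chain and its orbital variant from the empty set with $\lambda = 1$ and measures the total variation distance between the empirical distribution of all accumulated samples and the exact $\pi_\lambda$.

```python
import random
from itertools import product

# 3x3 grid, vertices a..i numbered 0..8 row by row.
N = 9
EDGES = [(3 * r + c, 3 * r + c + 1) for r in range(3) for c in range(2)]
EDGES += [(3 * r + c, 3 * (r + 1) + c) for r in range(2) for c in range(3)]
NEIGHBORS = {v: set() for v in range(N)}
for u, v in EDGES:
    NEIGHBORS[u].add(v)
    NEIGHBORS[v].add(u)

# Generators from the text: (a c)(d f)(g i) and (a i)(b f)(d h).
GENERATORS = [(2, 1, 0, 5, 4, 3, 8, 7, 6), (8, 5, 2, 7, 4, 1, 6, 3, 0)]


def close_group(generators):
    """Enumerate the generated group (only feasible for tiny groups)."""
    identity = tuple(range(N))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = tuple(h[g[i]] for i in range(N))  # apply g, then h
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return sorted(group)


def insert_delete_step(X, lam, rng):
    v = rng.randrange(N)
    if v in X:
        return X - {v} if rng.random() < 1 / (1 + lam) else X
    if NEIGHBORS[v].isdisjoint(X) and rng.random() < lam / (1 + lam):
        return X | {v}
    return X


def orbital_step(X, lam, group, rng):
    """One insert/delete move, then a uniform draw from the orbit."""
    X = insert_delete_step(X, lam, rng)
    g = rng.choice(group)  # uniform g gives a uniform orbit element
    return frozenset(g[v] for v in X)


def target_distribution(lam):
    sets = (frozenset(v for v in range(N) if bits[v])
            for bits in product((0, 1), repeat=N))
    weights = {X: lam ** len(X) for X in sets
               if all(NEIGHBORS[v].isdisjoint(X) for v in X)}
    Z = sum(weights.values())
    return {X: w / Z for X, w in weights.items()}


def total_variation(counts, target):
    total = sum(counts.values())
    return 0.5 * sum(abs(counts.get(X, 0) / total - p)
                     for X, p in target.items())


def run(step, steps, seed):
    rng = random.Random(seed)
    X, counts = frozenset(), {}  # start in the empty set, no burn-in
    for _ in range(steps):
        X = step(X, rng)
        counts[X] = counts.get(X, 0) + 1
    return counts


GROUP = close_group(GENERATORS)
PI = target_distribution(1.0)
plain = run(lambda X, rng: insert_delete_step(X, 1.0, rng), 20000, seed=0)
orbital = run(lambda X, rng: orbital_step(X, 1.0, GROUP, rng), 20000, seed=0)
print("Gibbs:", total_variation(plain, PI))
print("Orbital Gibbs:", total_variation(orbital, PI))
```

The reported experiments plot this distance against elapsed time rather than step count, so a faithful comparison would also need to account for the per-sample overhead of the orbital step.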

6 Discussion 

The attentive reader might have recognized a similarity to lumping of Markov chains, which amounts to partitioning the state space of the chain [5]. Computing the coarsest lumping quotient of a Markov chain with a bisimulation procedure is linear in the number of non-zero probability transitions of the chain and, hence, in most cases exponential in the number of random variables. Since merely counting equivalence classes in the Pólya theory setting is a #P-complete problem [18], there are clear computational limitations to this approach. Orbital Markov chains, on the other hand, combine the advantages of a compact representation of symmetries as generating sets of permutation groups with highly efficient product replacement algorithms and, therefore, provide the advantages of lumping while avoiding the intractable explicit computation of the partition of the state space.
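To illustrate why a generating set suffices, here is a minimal sketch of a basic product replacement sampler. It is a simplified textbook variant, not the Orb implementation; the class name, the slot count and the warm-up length are assumptions chosen for illustration. The sampler keeps a short list of group elements and repeatedly multiplies one slot by another (or by its inverse), so it only ever stores a constant number of permutations.

```python
import random


def compose(p, q):
    """Return the permutation 'apply p, then q'."""
    return tuple(q[i] for i in p)


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


class ProductReplacement:
    """Approximately uniform random group elements from generators."""

    def __init__(self, generators, slots=10, warmup=50, seed=None):
        self.rng = random.Random(seed)
        gens = [tuple(g) for g in generators]
        size = max(slots, len(gens), 2)
        # Fill the slots by repeating the generators.
        self.slots = [gens[i % len(gens)] for i in range(size)]
        for _ in range(warmup):  # discard the first, poorly mixed elements
            self.next()

    def next(self):
        i, j = self.rng.sample(range(len(self.slots)), 2)
        factor = self.slots[j]
        if self.rng.random() < 0.5:
            factor = inverse(factor)
        if self.rng.random() < 0.5:
            self.slots[i] = compose(self.slots[i], factor)
        else:
            self.slots[i] = compose(factor, self.slots[i])
        return self.slots[i]


# Generators of the 3-grid symmetries: (a c)(d f)(g i) and (a i)(b f)(d h),
# with the vertices a..i numbered 0..8.
GRID_GENERATORS = [(2, 1, 0, 5, 4, 3, 8, 7, 6), (8, 5, 2, 7, 4, 1, 6, 3, 0)]

sampler = ProductReplacement(GRID_GENERATORS, seed=1)
g = sampler.next()
state = (1, 0, 0, 0, 0, 0, 0, 0, 0)  # indicator vector of the set {a}
image = [0] * len(state)
for v, bit in enumerate(state):
    image[g[v]] = bit
print(g, tuple(image))
```

Each call returns an element of the generated group, and applying it to the current state gives a member of that state's orbit. The output is only approximately uniform, and the quality depends on the slot count and warm-up length.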

One can apply orbital Markov chains to other graphical models that exhibit symmetries, such as the Ising model. Since Markov chains in general and Gibbs samplers in particular are components of numerous algorithms (cf. [43, 20, 38, 19, 23, 3]), we expect orbital Markov chains to improve the performance of these algorithms when applied to problems that exhibit symmetries. For instance, sampling algorithms for statistical relational languages are obvious candidates for improvement. Future work will include the integration of orbital Markov chains with algorithms for marginal as well as maximum a-posteriori inference. We will also apply the symmetry detection approach to make existing inference algorithms more efficient by, for instance, using symmetry breaking constraints in combinatorial optimization approaches to maximum a-posteriori inference in Markov logic networks (cf. [39, 32, 33]).

While we have shown that permutation groups are computable with graph automorphism algorithms for a large class of models, it is also possible to assume certain symmetries in the model in the same way that (conditional) independencies are assumed in the design stage of a probabilistic graphical model. Orbital Markov chains could easily incorporate these symmetries in the form of permutation groups.
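As a concrete toy case of supplying a symmetry by hand, the sketch below takes a three-vertex path a - b - c, assumes the single symmetry that swaps the two end vertices, and builds the orbital transition matrix from the insert/delete chain using the rule $P(x, y) = (1/|y^{\mathfrak{G}}|) \sum_{y' \in y^{\mathfrak{G}}} P'(x, y')$ from the proof of Theorem 4.2. The graph, the value $\lambda = 2$ and all function names are assumptions made for illustration. Exact rational arithmetic lets the detailed balance condition be checked without rounding.

```python
from fractions import Fraction
from itertools import product

# Toy model: path a - b - c with vertices 0, 1, 2.
N = 3
NEIGHBORS = {0: {1}, 1: {0, 2}, 2: {1}}
LAM = Fraction(2)
# Assumed symmetry group: identity and the swap (a c).
GROUP = [(0, 1, 2), (2, 1, 0)]

ALL_SETS = [frozenset(v for v in range(N) if bits[v])
            for bits in product((0, 1), repeat=N)]
STATES = [X for X in ALL_SETS
          if all(NEIGHBORS[v].isdisjoint(X) for v in X)]

Z = sum(LAM ** len(X) for X in STATES)
PI = {X: LAM ** len(X) / Z for X in STATES}


def base_matrix():
    """Transition probabilities P' of the insert/delete chain."""
    P = {x: {y: Fraction(0) for y in STATES} for x in STATES}
    for X in STATES:
        for v in range(N):
            if v in X:
                Y, p = X - {v}, 1 / (1 + LAM)
            elif NEIGHBORS[v].isdisjoint(X):
                Y, p = X | {v}, LAM / (1 + LAM)
            else:
                Y, p = X, Fraction(0)
            P[X][Y] += Fraction(1, N) * p
            P[X][X] += Fraction(1, N) * (1 - p)
    return P


def orbit(X):
    return {frozenset(g[v] for v in X) for g in GROUP}


def orbital_matrix(P_base):
    """P(x, y) = (1 / |orbit(y)|) * sum of P'(x, y') over y' in orbit(y)."""
    return {x: {y: sum(P_base[x][z] for z in orbit(y)) / len(orbit(y))
                for y in STATES}
            for x in STATES}


def is_reversible(P, pi):
    return all(pi[x] * P[x][y] == pi[y] * P[y][x]
               for x in STATES for y in STATES)


P_BASE = base_matrix()
P_ORBITAL = orbital_matrix(P_BASE)
print(is_reversible(P_BASE, PI), is_reversible(P_ORBITAL, PI))
```

The check relies on the assumed permutation being a genuine symmetry of both the graph and the distribution. Replacing `GROUP` with a permutation that is not an automorphism of the path would generally break detailed balance.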
A Proof of Theorem 4.2

We first prove (a). Since $\mathcal{M}'$ is aperiodic we have, for each state $x \in \Omega$ and every time step $t > 0$, a non-zero probability for the Markov chain $\mathcal{M}'$ to remain in state $x$ at time $t + 1$. At each time $t + 1$, the orbital Markov chain transitions uniformly at random to one of the states in the orbit of the original chain's state at time $t + 1$. Since every state is an element of its own orbit, we have, for every state $x \in \Omega$ and every time step $t > 0$, a non-zero probability for the Markov chain $\mathcal{M}$ to remain in state $x$ at time $t + 1$. Hence, $\mathcal{M}$ is aperiodic. The proof of statement (b) is accomplished in an analogous fashion and omitted.

Let $P(x, y)$ and $P'(x, y)$ be the probabilities of $\mathcal{M}$ and $\mathcal{M}'$, respectively, to transition from state $x$ to state $y$. Since $\pi$ is a reversible distribution for $\mathcal{M}'$ we have that $\pi(x) P'(x, y) = \pi(y) P'(y, x)$ for all states $x, y \in \Omega$. For every state $x \in \Omega$ let $x^{\mathfrak{G}}$ be the orbit of $x$. Let $\mathfrak{G}_x := \{g \in \mathfrak{G} \mid x^g = x\}$ be the stabilizer subgroup of $x$ with respect to $\mathfrak{G}$. We have that

$$\sum_{g \in \mathfrak{G}} P'(x, y^g) = \sum_{y' \in y^{\mathfrak{G}}} |\mathfrak{G}_{y'}| \, P'(x, y') = |\mathfrak{G}_y| \sum_{y' \in y^{\mathfrak{G}}} P'(x, y') = \frac{|\mathfrak{G}|}{|y^{\mathfrak{G}}|} \sum_{y' \in y^{\mathfrak{G}}} P'(x, y') \qquad (1)$$

where the last two equalities follow from the orbit-stabilizer theorem. We will now prove that $\pi(x) P(x, y) = \pi(y) P(y, x)$ for all states $x, y \in \Omega$. By definition of the orbital Markov chain we have that $\pi(x) P(x, y) = \pi(x) \frac{1}{|y^{\mathfrak{G}}|} \sum_{y' \in y^{\mathfrak{G}}} P'(x, y')$ and, by equation (1),

$$\pi(x) \frac{1}{|y^{\mathfrak{G}}|} \sum_{y' \in y^{\mathfrak{G}}} P'(x, y') = \pi(x) \frac{1}{|y^{\mathfrak{G}}|} \frac{|y^{\mathfrak{G}}|}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} P'(x, y^g) = \pi(x) \frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} P'(x, y^g) = \frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} \pi(x) P'(x, y^g).$$

Since $P'$ is reversible and $\pi(x) = \pi(x^g)$ for all $g \in \mathfrak{G}$ we have

$$\frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} \pi(x) P'(x, y^g) = \frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} \pi(y^g) P'(y^g, x) = \pi(y) \frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} P'(y^g, x).$$

Now, since $P'(x, y) = P'(x^g, y^g)$ for all $x, y \in \Omega$ and all $g \in \mathfrak{G}$ by assumption, we have that

$$\pi(y) \frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} P'(y^g, x) = \pi(y) \frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} P'(y, x^g)$$

and, again by equation (1),

$$\pi(y) \frac{1}{|\mathfrak{G}|} \sum_{g \in \mathfrak{G}} P'(y, x^g) = \pi(y) \frac{1}{|\mathfrak{G}|} \frac{|\mathfrak{G}|}{|x^{\mathfrak{G}}|} \sum_{x' \in x^{\mathfrak{G}}} P'(y, x') = \pi(y) \frac{1}{|x^{\mathfrak{G}}|} \sum_{x' \in x^{\mathfrak{G}}} P'(y, x') = \pi(y) P(y, x). \qquad \Box$$

B Proof of Theorem 4.6

Let $H : \Omega \times \Omega \to \mathbb{N}$ be the Hamming distance between any two elements in $\Omega$. We provide a path coupling argument on the set of pairs having Hamming distance 1. Let $X$ and $Y$ be two independent sets which differ only at one vertex $v$ with degree $d$. We assume, without loss of generality, that $v \in X \setminus Y$. Choose a vertex $w$ uniformly at random. We distinguish five cases:

(i) if $w = v$ then sample one $g \in \mathfrak{G}$ uniformly at random and let $(X', Y') = (X^g, X^g)$ with probability $\lambda/(1 + \lambda)$, otherwise let $(X', Y') = (Y^g, Y^g)$. Hence, $H(X', Y') = 0$ with probability 1.

(ii) if $w \ne v$ and $w \in X$ then sample one $g \in \mathfrak{G}$ uniformly at random and let $(X', Y') = ((X \setminus \{w\})^g, (Y \setminus \{w\})^g)$ with probability $1/(1 + \lambda)$, otherwise let $(X', Y') = (X^g, Y^g)$. In both cases, we have that $H(X', Y') = 1$.

(iii) if $w \ne v$, $w \notin X$ and $w$ has no neighbor in $X$ then sample one $g \in \mathfrak{G}$ uniformly at random and let $(X', Y') = ((X \cup \{w\})^g, (Y \cup \{w\})^g)$ with probability $\lambda/(1 + \lambda)$, otherwise let $(X', Y') = (X^g, Y^g)$. In both cases, we have that $H(X', Y') = 1$.

(iv) if $w \ne v$, $w \notin X$ and $w$ has a neighbor in $X$ but not in $Y$, then sample one $g \in \mathfrak{G}$ uniformly at random. Let $\mathfrak{G}' := \{g' \in \mathfrak{G} \mid X^g = (Y \cup \{w\})^{g'}\}$. If $\mathfrak{G}' \ne \emptyset$ then sample one $g' \in \mathfrak{G}'$ uniformly at random and let $(X', Y') = (X^g, (Y \cup \{w\})^{g'})$ with probability $\lambda/(1 + \lambda)$. In this case we have $H(X', Y') = 0$. If $\mathfrak{G}' = \emptyset$ then let $(X', Y') = (X^g, (Y \cup \{w\})^g)$ with probability $\lambda/(1 + \lambda)$. Here we have $H(X', Y') = 2$. Otherwise let $(X', Y') = (X^g, Y^g)$. Here, we have $H(X', Y') = 1$.

(v) in all other cases sample one $g \in \mathfrak{G}$ uniformly at random and let $(X', Y') = (X^g, Y^g)$. Here we have with probability 1 that $H(X', Y') = 1$.

In summary, we have that

$$E[H(X', Y') - 1] \le -\frac{1}{n} + q \, (2p - 1) \, \frac{\lambda}{1 + \lambda}$$

where $q = \Pr[w \ne v$, $w \notin X$, and $w$ has a neighbor in $X$ but not in $Y]$ and $p = \Pr[X \notin (Y \cup \{w\})^{\mathfrak{G}} \mid w \ne v$, $w \notin X$, and $w$ has a neighbor in $X$ but not in $Y]$. If $p < 0.5$ we have that $E[H(X', Y') - 1] \le -\frac{1}{n}$; otherwise we have that

$$E[H(X', Y') - 1] \le -\frac{1}{n} + \frac{d}{n} (2p - 1) \frac{\lambda}{1 + \lambda} \le -\frac{1}{n} + \frac{\Delta}{n} (2p - 1) \frac{\lambda}{1 + \lambda}.$$

Hence, $\mathcal{M}(\mathcal{I}(G))$ mixes rapidly if either $p < 0.5$ or $\lambda((2p - 1)\Delta - 1) < 1$. For $\lambda((2p - 1)\Delta - 1) = 1$ one can verify that there exists an $\alpha > 0$ such that $\Pr[H(X_{t+1}, Y_{t+1}) \ne H(X_t, Y_t)] \ge \alpha$ for all $t$. □
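The step from the expected-distance bound to the stated rapid mixing condition is short algebra. The derivation below spells it out under two assumptions about notation: that $\Delta$ denotes the maximum degree of $G$, and that the bound in the case $p \ge 0.5$ reads $-\frac{1}{n} + \frac{\Delta}{n}(2p - 1)\frac{\lambda}{1 + \lambda}$. Path coupling needs the expected distance to contract, that is, this bound to be negative.

```latex
\begin{align*}
-\frac{1}{n} + \frac{\Delta}{n}\,(2p - 1)\,\frac{\lambda}{1 + \lambda} < 0
  &\iff \Delta\,(2p - 1)\,\frac{\lambda}{1 + \lambda} < 1
     && \text{(multiply by } n > 0\text{)} \\
  &\iff \Delta\,(2p - 1)\,\lambda < 1 + \lambda
     && \text{(multiply by } 1 + \lambda > 0\text{)} \\
  &\iff \lambda\,\bigl((2p - 1)\,\Delta - 1\bigr) < 1 .
\end{align*}
```

For example, with $\Delta = 4$ and $p = 0.75$ the factor $(2p - 1)\Delta - 1$ equals 1, so the condition reduces to $\lambda < 1$. Whenever $(2p - 1)\Delta \le 1$ the left-hand side is non-positive and the condition holds for every $\lambda > 0$.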



